Matrix sketching for supervised classification with imbalanced classes
Abstract
The presence of imbalanced classes is more and more common in practical applications, and it is known to heavily compromise the learning process. In this paper we propose a new method aimed at addressing this issue in binary supervised classification. Re-balancing the class sizes has turned out to be a fruitful strategy to overcome this problem. Our proposal performs re-balancing through matrix sketching. Matrix sketching is a recently developed data compression technique that is characterized by the property of preserving most of the linear information present in the data. This property is guaranteed by the Johnson-Lindenstrauss Lemma (1984) and allows an n-dimensional space to be embedded into a reduced one without distorting, within an $\epsilon$-size interval, the distances between any pair of points. We use matrix sketching as an alternative to the standard re-balancing strategies that are based on random under-sampling of the majority class or random over-sampling of the minority one. We assess the properties of our proposal when combined with linear discriminant analysis (LDA), classification trees (C4.5) and Support Vector Machines (SVM) on simulated and real data. Results show that sketching can represent a sound alternative to the most widely used rebalancing methods.
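The re-balancing idea described in the abstract can be illustrated with a small example. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a Gaussian sketching matrix, it only compresses the majority class down to the minority class size, and the names `gaussian_sketch` and `rebalance_by_sketching` are hypothetical. The abstract does not say which sketching matrix the paper uses or how the target size is chosen.

```python
import numpy as np


def gaussian_sketch(X, k, rng):
    """Compress the n rows of X into k rows with a Gaussian sketching matrix.

    Assumption: entries of S are drawn from N(0, 1/k), so that
    E[S.T @ S] is the identity and X.T @ S.T @ S @ X matches X.T @ X
    in expectation.
    """
    n = X.shape[0]
    S = rng.normal(loc=0.0, scale=1.0 / np.sqrt(k), size=(k, n))
    return S @ X


def rebalance_by_sketching(X, y, majority_label, rng):
    """Replace the majority-class rows with a sketch of minority-class size."""
    is_majority = y == majority_label
    X_min, y_min = X[~is_majority], y[~is_majority]
    k = X_min.shape[0]  # hypothetical choice: match the minority size
    X_maj_sketched = gaussian_sketch(X[is_majority], k, rng)
    X_bal = np.vstack([X_maj_sketched, X_min])
    y_bal = np.concatenate([np.full(k, majority_label), y_min])
    return X_bal, y_bal


# Simulated imbalanced data: 900 majority rows, 100 minority rows.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (900, 5)), rng.normal(2.0, 1.0, (100, 5))])
y = np.array([0] * 900 + [1] * 100)

X_bal, y_bal = rebalance_by_sketching(X, y, majority_label=0, rng=rng)
print(X_bal.shape, np.bincount(y_bal))
```

The balanced pair `X_bal`, `y_bal` could then be passed to a classifier such as LDA, a classification tree, or an SVM. Unlike random under-sampling, every sketched row is a random linear combination of all majority-class rows, so no observation is discarded outright.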
Similar resources
Scalable Semi-Supervised Query Classification Using Matrix Sketching
The enormous scale of unlabeled text available today necessitates scalable schemes for representation learning in language processing. For instance, in this paper we are interested in classifying the intent of a user query. While our labeled data is quite limited, we have access to virtually an unlimited amount of unlabeled queries, which could be used to induce useful representations: for inst...
Semi-Supervised Learning for Imbalanced Sentiment Classification
Various semi-supervised learning methods have been proposed recently to solve the long-standing shortage problem of manually labeled data in sentiment classification. However, most existing studies assume the balance between negative and positive samples in both the labeled and unlabeled data, which may not be true in reality. In this paper, we investigate a more common case of semi-supervised ...
Streaming Classification with Emerging New Class by Class Matrix Sketching
Streaming classification with emerging new classes is an important problem that is both a significant research challenge and of practical value. In many real applications, the task often needs to handle large matrices, such as textual data in the bag-of-words model and large-scale image analysis. However, the methodologies and approaches adopted by the existing solutions, most of which involve massive distance c...
On Mining Fuzzy Classification Rules for Imbalanced Data
The fuzzy rule-based classification system (FRBCS) is a popular machine learning technique for classification purposes. One of the major issues when applying it to imbalanced data sets is its bias toward the majority class, such that it performs poorly with respect to the minority class. However, in many cases the minority classes are more important than the majority ones. In this paper, we have extended ...
Journal
Journal title: Data Mining and Knowledge Discovery
Year: 2021
ISSN: ['1573-756X', '1384-5810']
DOI: https://doi.org/10.1007/s10618-021-00791-3